Authored by: Thakoor Madhuvaishali
Duration: 150 mins
Level: Intermediate
Pre-requisite Skills: Python, Data analysis, Data cleaning, Data visualization
Scenario:
As a business owner or city planner, I want to understand the relationship between business locations (such as bars, taverns, and pubs) and public transport accessibility, so that I can make informed decisions about where to establish new businesses or how to improve public transport infrastructure to support existing businesses. By analyzing this relationship, I can identify areas where better transport options could enhance business patronage and growth, ultimately contributing to the economic development of Melbourne.
What this use case will teach you
At the end of this use case, you will:
Introduction:
Public transport accessibility is a crucial factor that can influence the success and growth of businesses. For bars, taverns, and pubs, easy access to public transport can significantly affect customer patronage, especially for those who prefer not to drive. This use case explores the relationship between business locations and public transport accessibility in Melbourne by integrating data from the City of Melbourne's Census of Land Use and Employment (CLUE) dataset (2002-2022) with the Bus Stops and Tram Tracks datasets. By overlaying business locations with bus stops and tram tracks, the analysis measures how close each business is to these public transport modes and what that may mean for patronage and growth.
DATASETS LIST:
https://data.melbourne.vic.gov.au/explore/dataset/bars-and-pubs-with-patron-capacity/information/
This dataset comes from the City of Melbourne's Census of Land Use and Employment (CLUE) and lists bars, taverns, and pubs recorded from 2002 to 2022. Each row gives the census year, CLUE small area, trading name, building and business address, patron capacity, and coordinates of an establishment.
https://data.melbourne.vic.gov.au/explore/dataset/bus-stops/
This dataset contains the locations of bus stops in the area, including their geographic coordinates (latitude and longitude). These coordinates are used to measure how close each business is to a bus stop.
https://data.melbourne.vic.gov.au/explore/dataset/tram-tracks/
This dataset contains the tram tracks in the area. Each record describes a section of track, with its geometry (geo_shape) and a single representative point (geo_point_2d). Tram stops are not listed as separate columns, so this analysis uses the track points as a proxy for tram access.
#importing all the modules
import requests
import pandas as pd
from io import StringIO
import matplotlib.pyplot as plt
import seaborn as sns
import folium
from folium.plugins import HeatMap
import ipywidgets as widgets
from ipywidgets import interact
from IPython.display import display
import numpy as np
#Loading the bars and pubs dataset
base_url='https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'
dataset_id='bars-and-pubs-with-patron-capacity'
url=f'{base_url}{dataset_id}/exports/csv'
params={'select':'*','limit':-1,'lang':'en','timezone':'UTC'}
response=requests.get(url,params=params)
if response.status_code==200:
    url_content=response.content.decode('utf-8')
    df_business=pd.read_csv(StringIO(url_content),delimiter=';')
    print(df_business.head(10))
else:
    print(f'Request failed with status code {response.status_code}')
census_year block_id property_id base_property_id \
0 2002 11 108972 108972
1 2002 14 103172 103172
2 2002 15 103944 103944
3 2002 16 103938 103938
4 2002 17 103925 103925
5 2002 33 105937 105937
6 2002 35 102143 102143
7 2002 37 102158 102158
8 2002 42 105392 105392
9 2002 43 108069 108069
building_address clue_small_area \
0 10-22 Spencer Street MELBOURNE 3000 Melbourne (CBD)
1 31-39 Elizabeth Street MELBOURNE 3000 Melbourne (CBD)
2 277-279 Flinders Lane MELBOURNE 3000 Melbourne (CBD)
3 187 Flinders Lane MELBOURNE 3000 Melbourne (CBD)
4 121-123 Flinders Lane MELBOURNE 3000 Melbourne (CBD)
5 435-443 Little Collins Street MELBOURNE 3000 Melbourne (CBD)
6 258-274 Collins Street MELBOURNE 3000 Melbourne (CBD)
7 86-88 Collins Street MELBOURNE 3000 Melbourne (CBD)
8 120-138 King Street MELBOURNE 3000 Melbourne (CBD)
9 131-141 Queen Street MELBOURNE 3000 Melbourne (CBD)
trading_name \
0 Explorers Inn
1 Connells Tavern
2 De Biers
3 Adelphi Hotel
4 Velour
5 The Kitchen Restaurant
6 Australia On Collins Bar
7 Dr Martin's Tavern
8 The Exchange Hotel
9 The Turf Bar & Restaurant
business_address number_of_patrons \
0 10-22 Spencer Street MELBOURNE 3000 50
1 35 Elizabeth Street MELBOURNE 3000 350
2 Unit 1, Basement , 277 Flinders Lane MELBOURNE... 400
3 187 Flinders Lane MELBOURNE 3000 80
4 Unit 1, Gnd & Bmt , 121 Flinders Lane MELBOURN... 350
5 LwrGround , 25 Little Collins Street MELBOURNE... 200
6 Shop 121, Basement , 260 Collins Street MELBOU... 11
7 86A Collins Street MELBOURNE 3000 291
8 Part Ground , 120 King Street MELBOURNE 3000 300
9 Unit 1, LwrGround , 131 Queen Street MELBOURNE... 400
longitude latitude location
0 144.955254 -37.820511 -37.82051068881513, 144.95525416628004
1 144.964322 -37.817426 -37.81742610667125, 144.964321660097
2 144.965307 -37.817242 -37.81724194023457, 144.96530699086
3 144.968385 -37.816360 -37.81635974400108, 144.9683846004515
4 144.970523 -37.815674 -37.815673646380745, 144.97052296371248
5 144.960107 -37.816397 -37.816397070511165, 144.96010676156988
6 144.964919 -37.815415 -37.81541503501324, 144.96491857864146
7 144.970594 -37.813938 -37.8139380932959, 144.97059378480083
8 144.956417 -37.817034 -37.81703382441968, 144.95641679014355
9 144.960543 -37.815497 -37.815496733626624, 144.96054302812774
#Loading the bus stops dataset
base_url='https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'
dataset_id='bus-stops'
url=f'{base_url}{dataset_id}/exports/csv'
params={'select':'*','limit':-1,'lang':'en','timezone':'UTC'}
response=requests.get(url,params=params)
if response.status_code==200:
    url_content=response.content.decode('utf-8')
    df_bus_stop=pd.read_csv(StringIO(url_content),delimiter=';')
    print(df_bus_stop.head(10))
else:
    print(f'Request failed with status code {response.status_code}')
geo_point_2d \
0 -37.80384165792465, 144.93239283833262
1 -37.81548699581418, 144.9581794249902
2 -37.81353897396532, 144.95728334230756
3 -37.82191394843844, 144.95539345270072
4 -37.83316401267591, 144.97443745130263
5 -37.79436108568101, 144.92998424529242
6 -37.817452093555325, 144.96168480565794
7 -37.82146476463953, 144.9303191551562
8 -37.837547087144706, 144.98191138368836
9 -37.812490976626215, 144.95370614040704
geo_shape prop_id addresspt1 \
0 {"coordinates": [144.93239283833262, -37.80384... 0 76.819824
1 {"coordinates": [144.9581794249902, -37.815486... 0 21.561304
2 {"coordinates": [144.95728334230756, -37.81353... 0 42.177187
3 {"coordinates": [144.95539345270072, -37.82191... 0 15.860434
4 {"coordinates": [144.97443745130263, -37.83316... 0 0.000000
5 {"coordinates": [144.92998424529242, -37.79436... 0 3.105722
6 {"coordinates": [144.96168480565794, -37.81745... 0 7.239726
7 {"coordinates": [144.9303191551562, -37.821464... 0 32.180664
8 {"coordinates": [144.98191138368836, -37.83754... 0 41.441167
9 {"coordinates": [144.95370614040704, -37.81249... 0 16.143764
addressp_1 asset_clas asset_type objectid str_id \
0 357 Signage Sign - Public Transport 355 1235255
1 83 Signage Sign - Public Transport 600 1231226
2 207 Signage Sign - Public Transport 640 1237092
3 181 Signage Sign - Public Transport 918 1232777
4 0 Signage Sign - Public Transport 1029 1271914
5 112 Signage Sign - Public Transport 1139 1577059
6 268 Signage Sign - Public Transport 1263 1481028
7 298 Signage Sign - Public Transport 2527 1245221
8 78 Signage Sign - Public Transport 2922 1248743
9 99 Signage Sign - Public Transport 5111 1253565
addresspt asset_subt model_desc mcc_id \
0 570648 NaN Sign - Public Transport 1 Panel 1235255
1 548056 NaN Sign - Public Transport 1 Panel 1231226
2 543382 NaN Sign - Public Transport 1 Panel 1237092
3 103975 NaN Sign - Public Transport 1 Panel 1232777
4 0 NaN Sign - Public Transport 1 Panel 1271914
5 616011 NaN Sign - Public Transport 1 Panel 1577059
6 527371 NaN Sign - Public Transport 1 Panel 1481028
7 110521 NaN Sign - Public Transport 1 Panel 1245221
8 107419 NaN Sign - Public Transport 1 Panel 1248743
9 602160 NaN Sign - Public Transport 1 Panel 1253565
roadseg_id descriptio model_no
0 21673 Sign - Public Transport 1 Panel Bus Stop Type 13 P.16
1 20184 Sign - Public Transport 1 Panel Bus Stop Type 8 P.16
2 20186 Sign - Public Transport 1 Panel Bus Stop Type 8 P.16
3 22174 Sign - Public Transport 1 Panel Bus Stop Type 8 P.16
4 22708 Sign - Public Transport 1 Panel Bus Stop Type 8 P.16
5 21693 Sign - Public Transport 1 Panel Bus Stop Type 1 P.16
6 20171 Sign - Public Transport 1 Panel Bus Stop Type 3 P.16
7 30638 Sign - Public Transport 1 Panel Bus Stop Type 3 P.16
8 22245 Sign - Public Transport 1 Panel Bus Stop Type 8 P.16
9 20030 Sign - Public Transport 1 Panel Bus Stop Type 8 P.16
#Loading the tram tracks dataset
base_url='https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'
dataset_id='tram-tracks'
url=f'{base_url}{dataset_id}/exports/csv'
params={'select':'*','limit':-1,'lang':'en','timezone':'UTC'}
response=requests.get(url,params=params)
if response.status_code==200:
    url_content=response.content.decode('utf-8')
    df_tram_tracks=pd.read_csv(StringIO(url_content),delimiter=';')
    print(df_tram_tracks.head(10))
else:
    print(f'Request failed with status code {response.status_code}')
geo_point_2d \
0 -37.78861331868605, 144.93461561782556
1 -37.819185576253524, 144.9610352542915
2 -37.81837971233182, 144.95945259116502
3 -37.81440385390417, 144.9702510191494
4 -37.816738613153454, 144.9699088149105
5 -37.83646719875296, 144.98204502621257
6 -37.82412832636803, 144.97049982199576
7 -37.797091788559044, 144.96748110523234
8 -37.79372033756351, 144.95828640079029
9 -37.8348813636736, 144.97472590308539
geo_shape \
0 {"coordinates": [[[[144.934525400489, -37.7886...
1 {"coordinates": [[[[144.96099441333, -37.81917...
2 {"coordinates": [[[[144.959343869512, -37.8182...
3 {"coordinates": [[[[144.969150077924, -37.8147...
4 {"coordinates": [[[[144.970082905454, -37.8167...
5 {"coordinates": [[[[144.982231917059, -37.8357...
6 {"coordinates": [[[[144.970150613164, -37.8234...
7 {"coordinates": [[[[144.967374842424, -37.7971...
8 {"coordinates": [[[[144.958247558775, -37.7938...
9 {"coordinates": [[[[144.974623014262, -37.8348...
descriptio name
0 <center><table><tr><th colspan='2' align='cent... kml_3
1 <center><table><tr><th colspan='2' align='cent... kml_5
2 <center><table><tr><th colspan='2' align='cent... kml_6
3 <center><table><tr><th colspan='2' align='cent... kml_7
4 <center><table><tr><th colspan='2' align='cent... kml_8
5 <center><table><tr><th colspan='2' align='cent... kml_9
6 <center><table><tr><th colspan='2' align='cent... kml_16
7 <center><table><tr><th colspan='2' align='cent... kml_22
8 <center><table><tr><th colspan='2' align='cent... kml_26
9 <center><table><tr><th colspan='2' align='cent... kml_27
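The three loading cells above repeat the same request and parsing logic with a different dataset_id. One possible way to factor that out is sketched below. The helper names (build_export_url, parse_export_csv, load_dataset) and the 60-second timeout are illustrative assumptions, not part of the original notebook; the URL, query parameters, and semicolon delimiter are taken from the cells above.

```python
from io import StringIO

import pandas as pd

BASE_URL = "https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/"
EXPORT_PARAMS = {"select": "*", "limit": -1, "lang": "en", "timezone": "UTC"}


def build_export_url(dataset_id: str) -> str:
    """Return the CSV export URL for a dataset (hypothetical helper)."""
    return f"{BASE_URL}{dataset_id}/exports/csv"


def parse_export_csv(text: str) -> pd.DataFrame:
    """Parse the semicolon-delimited CSV text returned by the export endpoint."""
    return pd.read_csv(StringIO(text), delimiter=";")


def load_dataset(dataset_id: str) -> pd.DataFrame:
    """Download one dataset and return it as a DataFrame (hypothetical helper)."""
    import requests  # imported here so the two helpers above also work offline

    response = requests.get(
        build_export_url(dataset_id), params=EXPORT_PARAMS, timeout=60
    )
    response.raise_for_status()  # raise on HTTP errors instead of printing a status code
    return parse_export_csv(response.content.decode("utf-8"))
```

With this in place, each dataset could be loaded in one line, for example df_bus_stop = load_dataset('bus-stops').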
This section covers the initial steps taken to understand and prepare the datasets for analysis. The bars and pubs dataset is explored first, followed by the bus stops and tram tracks datasets, which are structured similarly to each other. The df.info() command summarizes each dataset: the number of entries and columns, the non-null count, and the data type of each column. This summary helps assess overall quality and spot missing or incorrectly typed data. Finally, dropping rows with null values in key columns and removing columns unrelated to the analysis leaves each dataset smaller, easier to manipulate, and ready for further analysis.
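As a small illustration of the null check and cleaning described above, the sketch below uses a made-up three-row frame standing in for df_business. The trading names and values are invented for the example; only the column names match the real dataset.

```python
import pandas as pd

# Made-up rows standing in for df_business; the second row has no coordinates
sample = pd.DataFrame({
    "trading_name": ["Example Bar", "Example Pub", "Example Tavern"],
    "latitude": [-37.8205, None, -37.8174],
    "longitude": [144.9553, None, 144.9643],
})

# Count missing values per column (a compact complement to df.info())
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Keep only rows that have all the fields the analysis depends on
cleaned = sample.dropna(subset=["latitude", "longitude", "trading_name"])
print(len(sample), "rows before cleaning,", len(cleaned), "rows after")
```

isna().sum() gives the number of nulls directly, whereas df.info() reports non-null counts that have to be subtracted from the total.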
Exploring the bars, pubs and taverns dataset
# Display a summary of the dataset to check for number of null values
df_business.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4696 entries, 0 to 4695
Data columns (total 12 columns):
 #   Column             Non-Null Count  Dtype
---  ------             --------------  -----
 0   census_year        4696 non-null   int64
 1   block_id           4696 non-null   int64
 2   property_id        4696 non-null   int64
 3   base_property_id   4696 non-null   int64
 4   building_address   4696 non-null   object
 5   clue_small_area    4696 non-null   object
 6   trading_name       4696 non-null   object
 7   business_address   4696 non-null   object
 8   number_of_patrons  4696 non-null   int64
 9   longitude          4676 non-null   float64
 10  latitude           4676 non-null   float64
 11  location           4676 non-null   object
dtypes: float64(2), int64(5), object(5)
memory usage: 440.4+ KB
# Drop the columns that are not useful for the analysis
df_business = df_business.drop(columns=['location', 'census_year', 'property_id', 'base_property_id', 'number_of_patrons'])
#cleaning the datasets of missing values for latitude,longitude and name as these would be the crucial components for analysis
df_business = df_business.dropna(subset=['latitude', 'longitude', 'trading_name'])
df_business.head(5) # overview of the cleaned dataset
| | block_id | building_address | clue_small_area | trading_name | business_address | longitude | latitude |
|---|---|---|---|---|---|---|---|
| 0 | 11 | 10-22 Spencer Street MELBOURNE 3000 | Melbourne (CBD) | Explorers Inn | 10-22 Spencer Street MELBOURNE 3000 | 144.955254 | -37.820511 |
| 1 | 14 | 31-39 Elizabeth Street MELBOURNE 3000 | Melbourne (CBD) | Connells Tavern | 35 Elizabeth Street MELBOURNE 3000 | 144.964322 | -37.817426 |
| 2 | 15 | 277-279 Flinders Lane MELBOURNE 3000 | Melbourne (CBD) | De Biers | Unit 1, Basement , 277 Flinders Lane MELBOURNE... | 144.965307 | -37.817242 |
| 3 | 16 | 187 Flinders Lane MELBOURNE 3000 | Melbourne (CBD) | Adelphi Hotel | 187 Flinders Lane MELBOURNE 3000 | 144.968385 | -37.816360 |
| 4 | 17 | 121-123 Flinders Lane MELBOURNE 3000 | Melbourne (CBD) | Velour | Unit 1, Gnd & Bmt , 121 Flinders Lane MELBOURN... | 144.970523 | -37.815674 |
#Plotting Business Distribution by Clue Small Area
import matplotlib.pyplot as plt
plt.figure(figsize=(12, 6))
df_business['clue_small_area'].value_counts().plot(kind='bar')
plt.xlabel('Clue Small Area')
plt.ylabel('Number of Businesses')
plt.title('Business Distribution by Clue Small Area')
_ = plt.xticks(rotation=45, ha='right')
The bar chart shows the number of bars, pubs, and taverns in each CLUE small area. It highlights which areas have a high concentration of these businesses and which have few or none. For example, Melbourne (CBD) has the highest number of businesses.
# Function to filter map based on CLUE Small Area
def create_map(clue_area):
    # Filter the DataFrame based on the selected CLUE area
    filtered_df = df_business[df_business['clue_small_area'] == clue_area]
    # Create the map centered on the average location
    map_center = [filtered_df['latitude'].mean(), filtered_df['longitude'].mean()]
    m = folium.Map(location=map_center, zoom_start=13)
    # Add markers for each business location
    for idx, row in filtered_df.iterrows():
        folium.Marker(
            location=[row['latitude'], row['longitude']],
            popup=f"{row['trading_name']}<br>{row['business_address']}",
            tooltip=row['trading_name']
        ).add_to(m)
    # Create a custom legend
    legend_html = '''
    <div style="position: fixed;
                top: 10px; left: 10px;
                width: 150px; height: auto;
                background-color: white;
                border:2px solid grey;
                z-index: 9999;
                padding: 10px;">
        <h4>Legend</h4>
        <i class="fa fa-map-marker" style="color:blue"></i> Business Location<br>
    </div>
    '''
    # Add legend to the map
    m.get_root().html.add_child(folium.Element(legend_html))
    # Display the map
    display(m)

# Get the unique CLUE areas for the dropdown
clue_areas = df_business['clue_small_area'].unique()
# Use ipywidgets to create an interactive dropdown
interact(create_map, clue_area=clue_areas)
The dropdown filter lets users such as city planners or business owners explore one CLUE small area at a time and see how businesses are concentrated there. Narrowing the map to a single area makes it easier to focus on local business distribution and to assess public transport accessibility for that area.
Exploring the bus stop dataset
# Display a summary of the dataset to check for number of null values
df_bus_stop.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 309 entries, 0 to 308
Data columns (total 16 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   geo_point_2d  309 non-null    object
 1   geo_shape     309 non-null    object
 2   prop_id       309 non-null    int64
 3   addresspt1    309 non-null    float64
 4   addressp_1    309 non-null    int64
 5   asset_clas    309 non-null    object
 6   asset_type    309 non-null    object
 7   objectid      309 non-null    int64
 8   str_id        309 non-null    int64
 9   addresspt     309 non-null    int64
 10  asset_subt    0 non-null      float64
 11  model_desc    309 non-null    object
 12  mcc_id        309 non-null    int64
 13  roadseg_id    309 non-null    int64
 14  descriptio    309 non-null    object
 15  model_no      309 non-null    object
dtypes: float64(2), int64(7), object(7)
memory usage: 38.8+ KB
# Ensure the column is of type string
df_bus_stop['geo_point_2d'] = df_bus_stop['geo_point_2d'].astype(str)
# Split 'geo_point_2d' into 'latitude' and 'longitude'
df_bus_stop[['latitude', 'longitude']] = df_bus_stop['geo_point_2d'].str.split(', ', expand=True)
# Convert the new columns to float
df_bus_stop['latitude'] = df_bus_stop['latitude'].astype(float)
df_bus_stop['longitude'] = df_bus_stop['longitude'].astype(float)
# Drop the unrelated columns
df_bus_stop = df_bus_stop[['descriptio', 'model_no', 'model_desc', 'latitude', 'longitude']]
#Display the first few rows of the dataset to get an overview of the data structure and the types of values in each column
df_bus_stop.head(5)
| | descriptio | model_no | model_desc | latitude | longitude |
|---|---|---|---|---|---|
| 0 | Sign - Public Transport 1 Panel Bus Stop Type 13 | P.16 | Sign - Public Transport 1 Panel | -37.803842 | 144.932393 |
| 1 | Sign - Public Transport 1 Panel Bus Stop Type 8 | P.16 | Sign - Public Transport 1 Panel | -37.815487 | 144.958179 |
| 2 | Sign - Public Transport 1 Panel Bus Stop Type 8 | P.16 | Sign - Public Transport 1 Panel | -37.813539 | 144.957283 |
| 3 | Sign - Public Transport 1 Panel Bus Stop Type 8 | P.16 | Sign - Public Transport 1 Panel | -37.821914 | 144.955393 |
| 4 | Sign - Public Transport 1 Panel Bus Stop Type 8 | P.16 | Sign - Public Transport 1 Panel | -37.833164 | 144.974437 |
Exploring the tram tracks dataset
# Display a summary of the dataset to check for number of null values
df_tram_tracks.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 645 entries, 0 to 644
Data columns (total 4 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   geo_point_2d  645 non-null    object
 1   geo_shape     645 non-null    object
 2   descriptio    645 non-null    object
 3   name          645 non-null    object
dtypes: object(4)
memory usage: 20.3+ KB
# Ensure the column is of type string
df_tram_tracks['geo_point_2d'] = df_tram_tracks['geo_point_2d'].astype(str)
# Split 'geo_point_2d' into 'latitude' and 'longitude'
df_tram_tracks[['latitude', 'longitude']] = df_tram_tracks['geo_point_2d'].str.split(', ', expand=True)
# Convert the new columns to float
df_tram_tracks['latitude'] = df_tram_tracks['latitude'].astype(float)
df_tram_tracks['longitude'] = df_tram_tracks['longitude'].astype(float)
# Drop the original 'geo_point_2d', 'geo_shape' and 'descriptio' columns
# Note: the 'descriptio' column holds raw HTML that is not readable, so it is discarded and only the name is kept
df_tram_tracks = df_tram_tracks.drop(columns=['geo_point_2d', 'geo_shape', 'descriptio'])
df_tram_tracks.head(5) #overview of the tram tracks dataset
| | name | latitude | longitude |
|---|---|---|---|
| 0 | kml_3 | -37.788613 | 144.934616 |
| 1 | kml_5 | -37.819186 | 144.961035 |
| 2 | kml_6 | -37.818380 | 144.959453 |
| 3 | kml_7 | -37.814404 | 144.970251 |
| 4 | kml_8 | -37.816739 | 144.969909 |
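The same split of geo_point_2d into latitude and longitude is applied to both the bus stops and the tram tracks data. A minimal sketch of a shared helper is shown below. The function name add_lat_lon is a hypothetical choice, and the sample row reuses one coordinate pair from the bus stops output above.

```python
import pandas as pd


def add_lat_lon(df: pd.DataFrame, column: str = "geo_point_2d") -> pd.DataFrame:
    """Return a copy of df with float 'latitude' and 'longitude' columns
    parsed from a 'lat, lon' string column (hypothetical helper)."""
    parts = df[column].astype(str).str.split(",", expand=True)
    out = df.copy()
    # strip() tolerates both 'lat,lon' and 'lat, lon' spacing
    out["latitude"] = parts[0].str.strip().astype(float)
    out["longitude"] = parts[1].str.strip().astype(float)
    return out


# One coordinate string in the same format as the geo_point_2d column
sample = pd.DataFrame({"geo_point_2d": ["-37.80384165792465, 144.93239283833262"]})
parsed = add_lat_lon(sample)
print(parsed[["latitude", "longitude"]])
```

Returning a copy leaves the original frame untouched, so the helper can be applied to df_bus_stop and df_tram_tracks without side effects.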
# Create a base map centered on the average business location
map_center = [df_business['latitude'].mean(), df_business['longitude'].mean()]
m = folium.Map(location=map_center, zoom_start=12)
# Add markers for each business location
for idx, row in df_business.iterrows():
    folium.Marker(
        location=[row['latitude'], row['longitude']],
        popup=f"{row['trading_name']}<br>{row['business_address']}",
        tooltip=row['trading_name'],
        icon=folium.Icon(color='red', icon='info-sign')  # Customize business markers
    ).add_to(m)
# Add markers for bus stops (df_bus_stop has latitude and longitude columns)
for idx, row in df_bus_stop.iterrows():
    folium.Marker(
        location=[row['latitude'], row['longitude']],
        popup=f"Bus Stop: {row['descriptio']}",
        icon=folium.Icon(color='blue', icon='bus', prefix='fa')  # 'bus' is a Font Awesome icon, so prefix='fa' is needed
    ).add_to(m)
# Add markers for tram track points (df_tram_tracks has latitude and longitude columns)
for idx, row in df_tram_tracks.iterrows():
    folium.Marker(
        location=[row['latitude'], row['longitude']],
        popup=f"Tram Track: {row['name']}",
        icon=folium.Icon(color='green', icon='train', prefix='fa')  # 'train' is a Font Awesome icon, so prefix='fa' is needed
    ).add_to(m)
# Create a legend using HTML
legend_html = '''
<div style="position: fixed;
            bottom: 30px; left: 30px; width: 120px; height: 100px;
            background-color: white; border:2px solid grey; z-index:9999; font-size:14px;
            ">
    <b>Legend</b> <br>
    <i style="color:red;">●</i> Business <br>
    <i style="color:blue;">●</i> Bus Stop <br>
    <i style="color:green;">●</i> Tram Track <br>
</div>
'''
# Add the legend to the map
m.get_root().html.add_child(folium.Element(legend_html))
# Display the map
m
This map combines all three datasets (bars and pubs, bus stops, and tram tracks) to show how they are distributed across the city. Putting them on a single map gives an overview of where businesses sit relative to the public transport network in the area of interest.
In this section, we conduct an accessibility analysis of business locations within the city, focusing on their proximity to public transport, specifically bus stops and tram tracks.
By employing the Haversine distance formula, we calculate the distance of each business to its nearest public transport stop. This data is then normalized to derive an accessibility score, which quantifies how easily customers can reach these businesses via public transport. A lower score indicates better accessibility, while a higher score signifies potential challenges in accessing the business. This analysis aims to identify areas with adequate transport access and those lacking it, providing insights that can inform urban planning and business strategies.
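For reference, the calculation described above can be written out as follows. The symbols are notation chosen here for illustration: $\varphi$ is latitude and $\lambda$ is longitude (both in radians), $r = 6371$ km is the Earth radius used in the code, $d^{\text{bus}}_i$ and $d^{\text{tram}}_i$ are the nearest bus stop and tram track distances for business $i$, and the minimum and maximum are taken over all businesses.

```latex
\begin{aligned}
a &= \sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right)
   + \cos\varphi_1 \,\cos\varphi_2 \,\sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right) \\
d &= 2r \,\operatorname{atan2}\!\left(\sqrt{a},\ \sqrt{1 - a}\right) \\[6pt]
\tilde{d}_i &= \frac{d_i - d_{\min}}{d_{\max} - d_{\min}} \\
\text{score}_i &= \frac{\tilde{d}^{\,\text{bus}}_i + \tilde{d}^{\,\text{tram}}_i}{2}
\end{aligned}
```

Each normalized distance lies between 0 and 1, so the score does too, with 0 corresponding to the business closest to both modes.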
from math import radians, sin, cos, sqrt, atan2
# Function to calculate the great-circle distance (in km) between two points
def haversine_distance(lat1, lon1, lat2, lon2):
    # Convert latitude and longitude from degrees to radians
    lat1, lon1, lat2, lon2 = radians(lat1), radians(lon1), radians(lat2), radians(lon2)
    # Haversine formula
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    c = 2 * atan2(sqrt(a), sqrt(1 - a))
    r = 6371  # Radius of Earth in kilometers
    distance = r * c
    return distance
# For each business, find the distance (km) to the nearest bus stop
for index, business_row in df_business.iterrows():
    nearest_distance = float('inf')  # Initialize with a large value
    for stop_index, stop_row in df_bus_stop.iterrows():
        distance = haversine_distance(business_row['latitude'], business_row['longitude'],
                                      stop_row['latitude'], stop_row['longitude'])
        if distance < nearest_distance:
            nearest_distance = distance
    # Store the nearest bus stop distance for this business
    df_business.at[index, 'nearest_bus_stop_distance'] = nearest_distance
# For each business, find the distance (km) to the nearest tram track point
for index, business_row in df_business.iterrows():
    nearest_distance = float('inf')  # Initialize with a large value
    for track_index, track_row in df_tram_tracks.iterrows():
        distance = haversine_distance(business_row['latitude'], business_row['longitude'],
                                      track_row['latitude'], track_row['longitude'])
        if distance < nearest_distance:
            nearest_distance = distance
    # Store the nearest tram track distance for this business
    df_business.at[index, 'nearest_tram_track_distance'] = nearest_distance
This code calculates the distance from each business location to the nearest bus stop and the nearest tram track point, using the Haversine formula to account for the Earth's curvature. For every business it computes the distance to every bus stop and every tram track point, then stores the smallest value in the DataFrame. Distances are in kilometers. These nearest distances feed the mapping and accessibility analysis that follows.
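The nested iterrows() loops compare every business with every stop in pure Python, which can be slow for a few thousand businesses. A possible alternative, sketched below under the assumption that all pairwise distances fit comfortably in memory, uses NumPy broadcasting to compute them at once. The function name nearest_distance_km and the example coordinates are illustrative and not part of the original notebook.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # same radius as haversine_distance above


def nearest_distance_km(origins, stops):
    """For each (lat, lon) row in origins, return the haversine distance in km
    to the closest (lat, lon) row in stops. Inputs are in degrees."""
    origins = np.radians(np.asarray(origins, dtype=float))
    stops = np.radians(np.asarray(stops, dtype=float))
    # Shape (n_origins, 1) against (1, n_stops) broadcasts to all pairs
    lat1, lon1 = origins[:, 0][:, None], origins[:, 1][:, None]
    lat2, lon2 = stops[:, 0][None, :], stops[:, 1][None, :]
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    distances = 2 * EARTH_RADIUS_KM * np.arctan2(np.sqrt(a), np.sqrt(1 - a))
    return distances.min(axis=1)  # nearest stop per origin


# Made-up check: one origin on the equator, stops 1 and 2 degrees to the east
example = nearest_distance_km([[0.0, 0.0]], [[0.0, 1.0], [0.0, 2.0]])
print(example)  # the nearer stop is one degree of arc away
```

In the notebook this could be used as df_business['nearest_bus_stop_distance'] = nearest_distance_km(df_business[['latitude', 'longitude']].to_numpy(), df_bus_stop[['latitude', 'longitude']].to_numpy()), and likewise for df_tram_tracks.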
# Normalize the nearest bus stop distance
df_business['normalized_bus_stop_distance'] = (df_business['nearest_bus_stop_distance'] - df_business['nearest_bus_stop_distance'].min()) / (df_business['nearest_bus_stop_distance'].max() - df_business['nearest_bus_stop_distance'].min())
# Normalize the nearest tram track distance
df_business['normalized_tram_track_distance'] = (df_business['nearest_tram_track_distance'] - df_business['nearest_tram_track_distance'].min()) / (df_business['nearest_tram_track_distance'].max() - df_business['nearest_tram_track_distance'].min())
# Combine these normalized distances into a single accessibility score
df_business['accessibility_score'] = (df_business['normalized_bus_stop_distance'] + df_business['normalized_tram_track_distance']) / 2
# Create a base map
melbourne_map = folium.Map(location=[-37.8136, 144.9631], zoom_start=12)
# Add business locations to the map
for idx, row in df_business.iterrows():
    folium.CircleMarker(location=[row['latitude'], row['longitude']],
                        radius=5,
                        popup=f"{row['trading_name']}<br>Accessibility Score: {row['accessibility_score']:.2f}",
                        color='blue' if row['accessibility_score'] > 0.5 else 'red',
                        fill=True).add_to(melbourne_map)
# Create a custom legend
legend_html = '''
<div style="position: fixed;
top: 10px; left: 10px;
width: 150px; height: auto;
background-color: white;
border:2px solid grey;
z-index: 9999;
padding: 10px;">
<h4>Legend</h4>
<i style="background: blue; border-radius: 50%; width: 12px; height: 12px; display: inline-block;"></i> Accessibility Score > 0.5<br>
<i style="background: red; border-radius: 50%; width: 12px; height: 12px; display: inline-block;"></i> Accessibility Score ≤ 0.5<br>
</div>
'''
# Add legend to the map
melbourne_map.get_root().html.add_child(folium.Element(legend_html))
# Display the map
melbourne_map
Explanation of Accessibility Score
Low Scores (e.g., close to 0): Businesses in areas well-served by public transport, providing easy access for customers, staff, and deliveries.
High Scores (e.g., close to 1): Businesses located in areas with limited public transport access, which may deter customers or make commuting more difficult.
Interpretation of Accessibility Score Values
Value of 0.11:
An accessibility score of 0.11 means the business is relatively close to public transport. Because the score is the average of two normalized distances, both the bus stop distance and the tram track distance must be short relative to the rest of the dataset, indicating good access to public transport.
Value of 0.54:
An accessibility score of 0.54 means the business is relatively far from public transport compared with other businesses in the dataset. At least one of its two normalized distances is above 0.5, so customers or employees may face a longer walk to reach a bus stop or tram track.
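To make the score concrete, the sketch below applies the same min-max normalization and averaging to three made-up businesses. The distances are invented for illustration. The min_max helper is a hypothetical name, and mapping a constant column to 0.0 is an assumed convention that avoids division by zero, not something the original code does.

```python
import pandas as pd


def min_max(series: pd.Series) -> pd.Series:
    """Scale a Series to the range 0 to 1.
    A constant Series maps to 0.0 (assumed convention, avoids dividing by zero)."""
    span = series.max() - series.min()
    if span == 0:
        return pd.Series(0.0, index=series.index)
    return (series - series.min()) / span


# Invented nearest distances in km for three businesses
toy = pd.DataFrame({
    "nearest_bus_stop_distance": [0.05, 0.30, 0.55],
    "nearest_tram_track_distance": [0.10, 0.10, 0.90],
})
toy["accessibility_score"] = (
    min_max(toy["nearest_bus_stop_distance"])
    + min_max(toy["nearest_tram_track_distance"])
) / 2
print(toy.round(2))
```

The first business is closest on both measures and scores about 0, the third is farthest on both and scores about 1, and the second lands at about 0.25 because it is mid-range for buses but among the closest to a tram track. The scores are relative to the businesses included, so adding or removing businesses changes them.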
# Calculate the average accessibility score by CLUE Small Area
avg_accessibility_by_area = df_business.groupby('clue_small_area')['accessibility_score'].mean().reset_index()
# Sort the results for better visualization
avg_accessibility_by_area = avg_accessibility_by_area.sort_values(by='accessibility_score', ascending=False)
# Print the results
print(avg_accessibility_by_area)
                 clue_small_area  accessibility_score
8                 Port Melbourne             0.485347
2                 East Melbourne             0.326344
1                      Docklands             0.311380
3                     Kensington             0.186127
7                      Parkville             0.185884
10                     Southbank             0.184194
4                Melbourne (CBD)             0.153765
0                        Carlton             0.120954
5          Melbourne (Remainder)             0.108605
11  West Melbourne (Residential)             0.106587
6                North Melbourne             0.083887
9                    South Yarra             0.028589
#Plotting accessibility score by clue small area
plt.figure(figsize=(14, 6))
# Assign x to hue and disable the legend, as the seaborn FutureWarning for palette-without-hue recommends
sns.barplot(data=avg_accessibility_by_area, x='clue_small_area', y='accessibility_score', hue='clue_small_area', palette='coolwarm', legend=False)
plt.title('Average Accessibility Score by CLUE Small Area')
plt.xticks(rotation=45, ha='right')
plt.ylabel('Average Accessibility Score')
plt.xlabel('CLUE Small Area')
plt.show()
Explanation of results:
Because the score is built from normalized distances, a lower average means businesses in that area sit closer to a bus stop or tram track. These averages reflect only the stops and tracks present in the two City of Melbourne datasets used here.
Least Accessible Areas:
Port Melbourne (0.485): This area has the highest average score, so its bars and pubs are, on average, the farthest from the bus stops and tram tracks in these datasets.
East Melbourne (0.326) and Docklands (0.311): Both areas also score relatively high, indicating longer distances to the nearest stop or track than in most other areas.
Moderately Accessible Areas:
Kensington (0.186), Parkville (0.186) and Southbank (0.184): These areas have mid-range scores, indicating reasonable access to public transport with some room for improvement.
More Accessible Areas:
Melbourne (CBD) (0.154), Carlton (0.121), Melbourne (Remainder) (0.109), West Melbourne (Residential) (0.107) and North Melbourne (0.084): These areas have low scores, which implies that businesses are generally close to a bus stop or tram track.
Most Accessible Areas:
South Yarra (0.029): With the lowest average score in the list, businesses in South Yarra are the closest to bus stops and tram tracks, so patrons have the shortest distance to travel to reach public transport.
# Plotting an interactive map filtered by CLUE small area with accessibility scores of businesses
# Function to create a map with accessibility heatmaps
def create_map_with_heatmap(clue_area):
    # Filter the DataFrame based on the selected CLUE area
    filtered_df = df_business[df_business['clue_small_area'] == clue_area]
    bus_stop_locations = df_bus_stop[['latitude', 'longitude']].values
    tram_track_locations = df_tram_tracks[['latitude', 'longitude']].values
    # Create the map centered on the average location
    map_center = [filtered_df['latitude'].mean(), filtered_df['longitude'].mean()]
    m = folium.Map(location=map_center, zoom_start=13)
    # Add markers for each business location
    for idx, row in filtered_df.iterrows():
        folium.Marker(
            location=[row['latitude'], row['longitude']],
            popup=f"{row['trading_name']}<br>{row['business_address']}<br>Accessibility Score: {row['accessibility_score']:.2f}",
            tooltip=row['trading_name']
        ).add_to(m)
    # Create a heatmap for business locations
    HeatMap(filtered_df[['latitude', 'longitude']].values, radius=15, name='Businesses').add_to(m)
    # Create a heatmap for bus stops
    HeatMap(bus_stop_locations, radius=15, name='Bus Stops').add_to(m)
    # Create a heatmap for tram track points
    HeatMap(tram_track_locations, radius=15, name='Tram Tracks').add_to(m)
    # Create a heatmap weighted by accessibility score
    if not filtered_df.empty:
        scores = filtered_df['accessibility_score']
        score_range = scores.max() - scores.min()
        # Rescale scores to 0-1 within the area; use equal weights if all scores are identical
        if score_range > 0:
            norm_scores = (scores - scores.min()) / score_range
        else:
            norm_scores = pd.Series(1.0, index=scores.index)
        heatmap_data = np.column_stack([filtered_df['latitude'], filtered_df['longitude'], norm_scores])
        HeatMap(heatmap_data.tolist(), radius=20, name='Accessibility Scores').add_to(m)
    # Add a layer control so each heatmap can be toggled on and off
    folium.LayerControl().add_to(m)
    # Create a custom legend (all heatmaps share the default color gradient, so layers are listed by name)
    legend_html = '''
    <div style="position: fixed;
                top: 10px; left: 10px;
                width: 150px; height: auto;
                background-color: white;
                border:2px solid grey;
                z-index: 9999;
                padding: 10px;">
        <h4>Legend</h4>
        <i class="fa fa-map-marker" style="color:blue"></i> Business Location<br>
        Heatmap layers (toggle in the layer control):<br>
        Businesses<br>
        Bus Stops<br>
        Tram Tracks<br>
        Accessibility Scores<br>
    </div>
    '''
    # Add legend to the map
    m.get_root().html.add_child(folium.Element(legend_html))
    # Display the map
    display(m)

# Get the unique CLUE areas for the dropdown
clue_areas = df_business['clue_small_area'].unique()
# Use ipywidgets to create an interactive dropdown
interact(create_map_with_heatmap, clue_area=clue_areas)
The interactive map above visually represents the accessibility of businesses based on their proximity to public transport, enabling further analysis and insight into public transport accessibility in different areas.
# Count of businesses by CLUE Small Area
business_count_by_area = df_business.groupby('clue_small_area')['trading_name'].count().reset_index()
business_count_by_area = business_count_by_area.rename(columns={'trading_name': 'business_count'})
# Merge with the accessibility scores
area_analysis = pd.merge(avg_accessibility_by_area, business_count_by_area, on='clue_small_area')
# Print the combined results
print(area_analysis)
                 clue_small_area  accessibility_score  business_count
0                 Port Melbourne             0.485347               4
1                 East Melbourne             0.326344              43
2                      Docklands             0.311380             210
3                     Kensington             0.186127              39
4                      Parkville             0.185884              69
5                      Southbank             0.184194             410
6                Melbourne (CBD)             0.153765            3190
7                        Carlton             0.120954             293
8          Melbourne (Remainder)             0.108605              42
9   West Melbourne (Residential)             0.106587              66
10               North Melbourne             0.083887             287
11                   South Yarra             0.028589              23
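import matplotlib.pyplot as plt
import seaborn as sns
# Create the figure and the primary axis for the business counts
fig, ax1 = plt.subplots(figsize=(30, 10))
# Create a bar plot for business count on the left y-axis
sns.barplot(x='clue_small_area', y='business_count', data=area_analysis, color='lightblue', label='Business Count', ax=ax1)
# Create a second y-axis (sharing the x-axis) for the accessibility score
ax2 = ax1.twinx()
# sort=False keeps the areas in the same order as the bars
sns.lineplot(x='clue_small_area', y='accessibility_score', data=area_analysis, color='orange', marker='o', label='Accessibility Score', sort=False, ax=ax2)
# Add titles and labels, one y-label per axis
ax1.set_title('Business Count and Accessibility Score by CLUE Small Area', fontsize=16)
ax1.set_xlabel('CLUE Small Area', fontsize=12)
ax1.set_ylabel('Business Count', fontsize=14)
ax2.set_ylabel('Accessibility Score', fontsize=14)
# Rotate x-tick labels to 60 degrees for better readability
plt.setp(ax1.get_xticklabels(), rotation=60, ha='right', fontsize=12)
# Combine the legend entries from both axes into a single legend
handles1, labels1 = ax1.get_legend_handles_labels()
handles2, labels2 = ax2.get_legend_handles_labels()
for ax in (ax1, ax2):
    if ax.get_legend() is not None:
        ax.get_legend().remove()
ax1.legend(handles1 + handles2, labels1 + labels2, loc='upper left')
# Show the plot
plt.tight_layout()
plt.show()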
Explanation:
Business Count:
Melbourne (CBD) has the highest business count (3190), significantly outnumbering other areas. This indicates a concentration of commercial activity, which may be due to factors like high foot traffic, amenities, and accessibility.
Accessibility Scores:
Port Melbourne has the highest accessibility score (0.485347) but only has 4 businesses. This suggests that while the area may be well-connected to public transport, it lacks commercial establishments.
Melbourne (CBD) has a lower accessibility score (0.153765) but an exceptionally high business count (3190). This may indicate that, although the average accessibility is not as high, other factors drive business presence in this area.
Trends and Patterns:
Areas with higher accessibility scores like Port Melbourne and East Melbourne have relatively fewer businesses compared to Southbank and Melbourne (CBD). This suggests that while accessibility is crucial, it may not be the only factor influencing the number of businesses.
Areas with a lower accessibility score may still have a high business count if they are in strategic locations that attract foot traffic, tourists, or specific demographics.
Potential Outliers:
Southbank combines a moderate accessibility score (0.184194) with a high business count (410). This could indicate a successful urban design and transport integration.
Docklands has a relatively high accessibility score (0.311380) and a significant number of businesses (210), showing that it is well-positioned on both measures.
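The relationship between the two columns discussed above can be summarized with a single number. The sketch below is a minimal example that computes a Pearson correlation coefficient from the twelve rows of the printed area table; the helper name `pearson_correlation` is illustrative, and with so few areas the result should be read as a rough indication only.

```python
from math import sqrt

# Area-level values copied from the printed area_analysis table
accessibility_scores = [0.485347, 0.326344, 0.311380, 0.186127, 0.185884, 0.184194,
                        0.153765, 0.120954, 0.108605, 0.106587, 0.083887, 0.028589]
business_counts = [4, 43, 210, 39, 69, 410, 3190, 293, 42, 66, 287, 23]


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


r = pearson_correlation(accessibility_scores, business_counts)
print(f"Pearson r between accessibility score and business count: {r:.2f}")
```

On these twelve areas the coefficient is weakly negative, largely because Melbourne (CBD) dominates the business counts while having a comparatively low score. In the notebook, `area_analysis['accessibility_score'].corr(area_analysis['business_count'])` should give the same figure.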
In this section we assess the walkability of the areas surrounding bars and pubs by checking whether each business falls within a 200-meter walking range of a public transport stop, and by counting the businesses that are close to bus stops and those that are close to tram stops.
from math import radians, sin, cos, sqrt, atan2
# Function to calculate the haversine distance between two coordinates
def haversine_distance(lat1, lon1, lat2, lon2):
    # Convert latitude and longitude from degrees to radians
    lat1, lon1, lat2, lon2 = radians(lat1), radians(lon1), radians(lat2), radians(lon2)
    # Haversine formula to calculate the distance
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    c = 2 * atan2(sqrt(a), sqrt(1 - a))
    r = 6371  # Radius of the Earth in kilometers
    distance = r * c  # Distance in kilometers
    return distance * 1000  # Convert to meters
# Define the walking distance threshold (200 meters)
walking_distance_threshold = 200
# Create columns to store if a business is within walking distance of a bus or tram stop
df_business['within_200m_bus'] = False
df_business['within_200m_tram'] = False
# Iterate over each business and check proximity to bus stops
for index, business in df_business.iterrows():
    for _, bus_stop in df_bus_stop.iterrows():
        distance = haversine_distance(business['latitude'], business['longitude'],
                                      bus_stop['latitude'], bus_stop['longitude'])
        if distance <= walking_distance_threshold:
            df_business.at[index, 'within_200m_bus'] = True
            break  # No need to check further bus stops for this business
# Iterate over each business and check proximity to tram stops
for index, business in df_business.iterrows():
    for _, tram_stop in df_tram_tracks.iterrows():
        distance = haversine_distance(business['latitude'], business['longitude'],
                                      tram_stop['latitude'], tram_stop['longitude'])
        if distance <= walking_distance_threshold:
            df_business.at[index, 'within_200m_tram'] = True
            break  # No need to check further tram stops for this business
# Now, let's calculate the total number of businesses within 200 meters of any public transport stop
df_business['within_200m_any'] = df_business['within_200m_bus'] | df_business['within_200m_tram']
# Get the count of businesses within walking distance of public transport
within_200m_count = df_business['within_200m_any'].sum()
# Print the results
print(f"Total number of businesses within 200 meters of any public transport stop: {within_200m_count}")
print(f"Businesses near bus stops: {df_business['within_200m_bus'].sum()}")
print(f"Businesses near tram stops: {df_business['within_200m_tram'].sum()}")
Total number of businesses within 200 meters of any public transport stop: 4475
Businesses near bus stops: 2323
Businesses near tram stops: 4142
Explanation:
The high total of businesses within 200 meters of public transport indicates a well-connected urban area where access to public transport is not a barrier for customers. This accessibility can lead to increased customer visits, supporting local economies and enhancing business performance.
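The proximity flags behind these counts come from checking every business against every stop in nested loops, which can be slow on larger datasets. As a possible alternative, the sketch below computes the same haversine distances with NumPy broadcasting and also reports which mode is nearest. It is a minimal sketch, not the notebook's method: the function name `nearest_stop_distance_m` and the sample coordinates are made up for illustration.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def nearest_stop_distance_m(biz_lat, biz_lon, stop_lat, stop_lon):
    """Distance in meters from each business to its nearest stop (haversine)."""
    # Shapes (n_businesses, 1) and (1, n_stops) broadcast to a full distance matrix
    lat1 = np.radians(np.asarray(biz_lat, dtype=float))[:, None]
    lon1 = np.radians(np.asarray(biz_lon, dtype=float))[:, None]
    lat2 = np.radians(np.asarray(stop_lat, dtype=float))[None, :]
    lon2 = np.radians(np.asarray(stop_lon, dtype=float))[None, :]
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    distances = 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))
    return distances.min(axis=1)


# Made-up coordinates for illustration only
biz_lat, biz_lon = [-37.8136, -37.8300], [144.9631, 144.9631]
bus_lat, bus_lon = [-37.8140], [144.9631]
tram_lat, tram_lon = [-37.8200, -37.8305], [144.9631, 144.9631]

bus_dist = nearest_stop_distance_m(biz_lat, biz_lon, bus_lat, bus_lon)
tram_dist = nearest_stop_distance_m(biz_lat, biz_lon, tram_lat, tram_lon)

within_200m_bus = bus_dist <= 200
within_200m_tram = tram_dist <= 200
# Label each business with the mode of its closest stop
nearest_mode = np.where(bus_dist <= tram_dist, "bus", "tram")
print(within_200m_bus, within_200m_tram, nearest_mode)
```

In the notebook, the latitude and longitude columns of `df_business`, `df_bus_stop` and `df_tram_tracks` could be passed in directly in place of the sample lists.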
Comparative Analysis:
The higher number of businesses near tram stops compared to bus stops might suggest that tram routes are more strategically placed or that trams are more heavily utilized in this area. It could also indicate a preference among the population for tram travel over buses, which may be due to factors like speed, frequency, or route convenience.
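The three printed counts also determine how much the bus and tram catchments overlap. The sketch below is a small worked example using inclusion-exclusion on the figures from the output above; the variable names are illustrative.

```python
# Counts printed by the walkability cell
near_any = 4475
near_bus = 2323
near_tram = 4142

# Inclusion-exclusion: |bus or tram| = |bus| + |tram| - |bus and tram|
near_both = near_bus + near_tram - near_any
bus_only = near_bus - near_both
tram_only = near_tram - near_both

print(f"Near both a bus and a tram stop: {near_both}")
print(f"Near a bus stop only: {bus_only}")
print(f"Near a tram stop only: {tram_only}")
```

This gives 1,990 businesses near both modes, 333 near a bus stop only, and 2,152 near a tram stop only, so most businesses that are close to a bus stop are also close to a tram stop. In the notebook, the overlap can be checked directly with `(df_business['within_200m_bus'] & df_business['within_200m_tram']).sum()`.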
import matplotlib.pyplot as plt
plt.figure(figsize=(12, 8))
# Plot bus stops
plt.scatter(df_bus_stop['longitude'], df_bus_stop['latitude'], color='blue', label='Bus Stops', alpha=0.5)
# Plot tram stops
plt.scatter(df_tram_tracks['longitude'], df_tram_tracks['latitude'], color='red', label='Tram Stops', alpha=0.5)
# Plot businesses and color based on accessibility
plt.scatter(df_business['longitude'], df_business['latitude'],
color=df_business['within_200m_any'].map({True: 'green', False: 'gray'}),
label='Businesses (Green: Accessible, Gray: Not Accessible)', alpha=0.5)
plt.title('Businesses and Public Transport Stops')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.legend()
plt.grid()
plt.show()
The scatter plot indicates that most of the businesses are within walking distance of public transport. The few grey dots mark businesses that fall outside the 200 m walkability range. By focusing on both the accessible and less accessible areas, the analysis can inform strategic planning initiatives that support equitable access to public transport and local businesses.
This use case analyzes the relationship between public transport accessibility and the locations of bars, pubs, and taverns in Melbourne. By integrating business locations with bus and tram stop data, we have gained insight into how well these businesses are served by public transport across the city.
Key findings from the analysis include:
Accessibility Scores: The calculated accessibility scores reveal areas with varying levels of access to public transport. These insights allow stakeholders to pinpoint underserved neighborhoods that may benefit from enhanced transport options, thereby improving overall accessibility. For instance, areas like Port Melbourne with an accessibility score of 0.485347 demonstrated the highest access to public transport, while regions such as South Yarra had an accessibility score of 0.028589, indicating significant room for improvement.
Business Count Analysis: The analysis also compared the count of businesses with the accessibility scores. The two do not move together in a simple way: Melbourne (CBD) has by far the most businesses (3,190) with a comparatively low score of 0.153765, while Port Melbourne has the highest score (0.485347) but only 4 businesses. South Yarra combines the lowest score (0.028589) with only 23 businesses. This suggests that accessibility matters but is not the only factor shaping where businesses concentrate.
Walkability Analysis: This involved checking whether each business location falls within a 200-meter walkability range of a bus or tram stop. In total, 4,475 businesses are within 200 meters of at least one stop, with 4,142 near a tram stop and 2,323 near a bus stop. Most bars and pubs are therefore a short walk from public transport, which underlines the role of walkable environments in supporting customer patronage.
Visualization: The interactive maps, which show the accessibility scores and can be filtered by CLUE small area, give a clear visual representation of business locations in relation to bus and tram stops. This tool supports informed decision-making for urban planners and business owners regarding optimal sites for new establishments.
Recommendations:
Impact on Business Density: Most bars and pubs are located within 200 meters of a tram or bus stop, but at the area level a higher average accessibility score does not by itself go with a greater concentration of businesses. Transport access should therefore be weighed alongside factors such as foot traffic and location when assessing sites. The heatmap layers on the interactive map provide a visual check of accessibility area by area.
Strategic Planning: The insights garnered from this analysis can inform urban planners' decisions regarding public transport service enhancements and potential new business establishments. Additionally, these findings can support local government initiatives aimed at fostering economic development.
In conclusion, this use case serves as a resource for promoting economic growth and supporting equitable access to public transport across Melbourne. The large share of businesses within walking distance of a stop underscores the importance of continued investment in public transport infrastructure to sustain the vitality and expansion of local businesses. By prioritizing accessibility, stakeholders can contribute to a thriving urban environment that benefits both residents and business owners.